Abstract

In clinical trials, a covariate-adjusted response-adaptive (CARA) design allows 
a subject newly entering a trial a better chance of being allocated to a superior 
treatment regimen based on cumulative information from previous subjects, and 
adjusts the allocation according to individual covariate information. Since this 
design allocates subjects sequentially, it is natural to apply a sequential method for 
estimating the treatment effect in order to make the data analysis more efficient. 
In this paper, we study the sequential estimation of treatment effect for a general 
CARA design. A stopping criterion is proposed such that the estimates satisfy a 
prescribed precision when the sampling is stopped. The properties of estimates 
and stopping time are obtained under the proposed stopping rule. In addition, we 
show that the asymptotic properties of the allocation function, under the proposed 
stopping rule, are the same as those obtained in the non-sequential/fixed sample 
size counterpart. We then illustrate the performance of the proposed procedure with 
some simulation results using logistic models. The properties, such as the coverage 
probability of treatment effect, correct allocation proportion and average sample 
size, for diverse combinations of initial sample sizes and tuning parameters in the 
utility function are discussed. 

Key words: Covariate-adjustment, logistic regression, response-adaptive design, se- 
quential estimation, stopping time, targeted drug, utility function 



1. Introduction 

From an ethical viewpoint, it is desirable to minimize the number of subjects allocated to
inferior treatments in the course of a clinical trial without jeopardizing the generation of
useful and meaningful statistical inferences. The response-adaptive (RA) design in clinical
trials (Zelen and Wei 1995 and Hu and Rosenberger 2006) is dedicated to this purpose.
The advantage of an RA design is that the information collected from subjects previously
entering the trial can be used to adjust the allocation probability so that a newly entering
subject can have a better chance of being allocated to a superior treatment. Because
of the sequential nature of this process, sequential statistical methods should be
used in order to analyze these kinds of data sets efficiently. Since data collected in this
manner are no longer independent, sequential methods that rely on the assumption of
independent observations are not valid. Moreover, due to innovation in genomic technologies
and the nature of developing targeted drugs (Simon and Maitournam 2005), it is natural
to incorporate into the model the information available on individual covariates that have
a strong influence on responses, since they may be associated with the efficacy of
treatments. Hence, the existence of an interaction between treatment and covariate becomes
a reasonable presumption as far as, for example, a targeted drug is concerned.

A situation where there is an interaction between covariates and treatments is illustrated
in Figure 1. In this figure, a logistic model is used to describe the relation between
responses to treatments and covariates, where the covariates are generated from two normal
distributions with a mean-shift denoting two sub-populations. Traditionally, we use
an RA design by assuming there is no treatment-covariate interaction effect; that is, the
slopes of the treatment effects are assumed to be equal. However, when a treatment-covariate
interaction exists, as in Figure 1, this assumption is not valid, and the lines A and B in
this figure will not be parallel. This implies that a method that uses an RA design will
make incorrect treatment allocations when such a non-ignorable interaction exists. In this
situation, it is reasonable to assume that a CARA design should perform better than an
RA design in terms of correct allocation proportions. However, up until now, little work
has been done on CARA designs. Since Figure 1 is for illustration purposes, it depicts an
extreme example of two treatments with opposite slopes. However, as long as the slopes
of the treatment effects are not equal, the two treatments differ substantially for
subjects with covariates located far from the intersection of the lines of the treatment effects.
Thus, as long as a targeted drug or other adaptive treatment strategy is being used, this
situation should not be ignored. In addition to the ethical considerations, this is a further
good reason for considering a CARA design. Further discussion of the properties
of RA and CARA designs can be found in Bandyopadhyay, Biswas, and Bhattacharya
(2007), Bandyopadhyay and De (2009), and Hu and Rosenberger (2006), among others.

Although the sequential characteristics of RA and CARA designs are clear, and the
sequential sampling method, which allows the sample size to be determined based on
the observed information, is known to be an adequate choice for making efficient and
valid statistical inference, most discussions in the literature to date have been limited to
the asymptotic properties of different designs. Even when the idea of a stopping rule
has been adopted, there has still been very little discussion of estimation under those
stopping criteria. Zhang and Hu (2009) and Bandyopadhyay and De (2009) are two
typical examples. In these two studies, only large scale simulation studies were conducted
to compare the properties of their designs and to provide information regarding suitable
sample sizes for their designs. In another example, Moler, Plo, and Miguel (2006) treated
the allocation governed by an urn model as a Robbins-Monro scheme, but the property of the
stopping rule was still ignored. In addition, Thall and Wathen (2005) compared the CARA
design to the balanced randomization design; however, the same stopping rule based on
the balanced randomized design was applied to both designs, which is inappropriate as
indicated in their paper.

As mentioned above and also in Hu and Rosenberger (2006), the sequential method is
a natural choice for a clinical trial based on a CARA design; however, it is rare to find
literature regarding the application of stopping rules for the sequential estimation procedure
based on CARA designs, or the effective sample size for a clinical trial with an adaptive
design. The difficulties are mostly due to the adaptive nature of CARA designs, which
makes the classical approach, based on the assumption of independent observations, less
useful. Besides the adaptive design, the adjustment of the allocation probability based
on a subject's covariate information makes the procedure even more complicated. Hence,
the asymptotic properties of estimates under randomly stopped CARA experiments,
derived in our paper, are not trivial and do not follow from their non-random sample size
counterpart.

In this paper, a sequential procedure is proposed for estimating treatment effects under
a general CARA design. Our goal is to estimate the treatment effects, with the minimum
sample size, such that the estimates satisfy a prescribed precision, and subjects can be
allocated to the superior treatment without interfering with the quality and efficiency
of estimation of treatment effects. The asymptotic properties of sequential estimates are
obtained under this general CARA design. In addition, we also show that the allocation
rule, under the proposed stopping criterion, maintains the same asymptotic properties as
those obtained in its non-sequential counterpart. In our numerical study, for illustration
purposes, we adopt the method of Bandyopadhyay et al. (2007) and use a utility function
to balance the ethical consideration and the efficiency of the estimate for treatment
allocation. We then modify the utility function to vary the tuning parameters sequentially
depending on the precision of the estimate at every allocation stage such that subjects
are allocated to a "more adequate" treatment.

The rest of this paper is organized as follows: A sequential estimation procedure
for treatment effect is proposed in Section 2. Simulation results for logistic
models using a modified allocation rule (Bandyopadhyay, Biswas, and Bhattacharya 2007)
are presented in Section 3. We then conclude with a discussion in Section 4. Proofs of
theorems are given in the Appendix.



2. Sequential Estimation of Treatment Effect 

Let $N_{m,k}$ be the number of subjects assigned to treatment $k$ during the first $m$ assignments
and $N_m = (N_{m,1}, \ldots, N_{m,K})$. Suppose that $\{Y_{m,k}, m = 1, 2, \ldots, k = 1, \ldots, K\}$ denotes
responses of the $m$-th subject to the $k$-th treatment and $Y_m = (Y_{m,1}, \ldots, Y_{m,K})$. Let $\xi_m$
be the covariates of the $m$-th subject. Suppose that $X_1, X_2, \ldots$ is the sequence of random
treatment assignments, and $X_m = (X_{m,1}, \ldots, X_{m,K})$, $X_{m,k} \in \{0, 1\}$, denotes assignment
of treatment $k$ to the $m$-th subject. Then $X_{m,k} = 1$ for some $k$ and $\sum_{k=1}^{K} X_{m,k} = 1$. That
is, each subject is allocated to one treatment only. Hence, it follows that the response of
subject $m$ to the treatment $k$, $Y_{m,k}$, is observed only if $X_{m,k} = 1$. (Note that this implies
that $N_m = \sum_{i=1}^{m} X_i$.)

Define $\mathcal{X}_m = \sigma(X_1, \ldots, X_m)$, $\mathcal{Y}_m = \sigma(Y_1, \ldots, Y_m)$, and $\mathcal{Z}_m = \sigma(\xi_1, \ldots, \xi_m)$, $\xi \in \mathbb{R}^p$,
to be the corresponding $\sigma$-fields. Let $\mathcal{F}_m = \sigma(\mathcal{X}_m, \mathcal{Y}_m, \mathcal{Z}_m)$; then a general CARA design is
defined as

$$\psi_m = E[X_m \mid \mathcal{F}_{m-1}, \xi_m] = E[X_m \mid \mathcal{X}_{m-1}, \mathcal{Y}_{m-1}, \mathcal{Z}_m].$$
Suppose that for each $m \ge 1$, the responses and covariate vector satisfy

$$E[Y_{m,k} \mid \xi] = \mu_k(\theta_k, \xi), \qquad (1)$$

where $\mu_k(\cdot, \cdot)$ are known functions, $V_k$ denotes the covariance matrix based on Equation
(1) and $\theta_k \in \mathbb{R}^p$ for $k = 1, \ldots, K$. The asymptotic properties of the estimate of $\theta =
(\theta_1, \ldots, \theta_K)$ and of the allocation function under such a general CARA design have been discussed
in Zhang et al. (2007). The estimation of $\theta$ is the primary goal in a clinical trial. Thus,
it will be beneficial if treatment effects can be estimated with a certain accuracy using
a minimum required sample size whilst simultaneously still retaining the good allocation
properties. Since, in a CARA design, the design at the current stage depends on the past
history, sequential analysis is the statistical tool of choice. Here a sequential estimation
procedure is proposed for constructing a confidence set for $\theta$ with a prescribed accuracy,
and we show that the asymptotic properties of the allocation function remain the same as
their non-sequential counterparts under such a sequential sampling strategy.

Suppose no prior information about the effects of treatments is available. In order to
estimate the treatment effects, at the beginning, we need to assign $m_0$ ($> 0$) subjects to each
treatment using restricted randomization. Hence, when we allocate the $m$-th subject ($m >
K m_0$), there are already $m - 1$ observations, $\{(X_1, Y_1, \xi_1), \ldots, (X_{m-1}, Y_{m-1}, \xi_{m-1})\}$,
collected. Thus, we assign the $m$-th subject to the treatment $k$ with probability

$$P(X_{m,k} = 1 \mid \mathcal{F}_{m-1}, \xi_m) = \pi_k(\hat\theta_{m-1}, \xi_m),$$

where $\hat\theta_{m-1}$ is the maximum quasi-likelihood estimate of $\theta$ based on the previous $m - 1$
observations and $\pi_k(\cdot, \cdot)$ is the true allocation probability for treatment $k$ and the given
covariate. Assume further that $\mu_k(\theta_k, \xi_m) = \mu_k(\xi_m' \theta_k)$ for each $m \ge 1$. Hence, it follows
from Equation (1) and $V_k$ that the method of generalized linear models (quasi-likelihood)
can be applied (McCullagh and Nelder 1989). Assume that $\theta_k \in \Theta_k \subset \mathbb{R}^p$ is bounded for
$k = 1, \ldots, K$, and let the parameter space be $\Theta = \prod_{k=1}^{K} \Theta_k$.
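The two-phase allocation described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the initial phase uses a random permutation with exactly $m_0$ subjects per treatment (one simple form of restricted randomization, assumed here), and `alloc_prob` is a hypothetical callback standing in for $\pi_k(\hat\theta_{m-1}, \xi_m)$. All function names are illustrative.

```python
import random


def restricted_randomization(K, m0, rng):
    """Return K*m0 treatment labels (0..K-1), each appearing exactly m0 times."""
    labels = [k for k in range(K) for _ in range(m0)]
    rng.shuffle(labels)
    return labels


def draw_treatment(probs, rng):
    """Sample one treatment index from a vector of allocation probabilities."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against rounding in the cumulative sum


def allocate(covariates, K, m0, alloc_prob, rng):
    """Assign each subject: balanced start-up first, then covariate-adjusted draws.

    alloc_prob(history, xi) is a hypothetical stand-in for the vector
    (pi_1, ..., pi_K) evaluated at the current estimate and the new covariate.
    """
    initial = restricted_randomization(K, m0, rng)
    history = []
    for m, xi in enumerate(covariates):
        if m < K * m0:
            k = initial[m]
        else:
            k = draw_treatment(alloc_prob(history, xi), rng)
        history.append((k, xi))
    return history
```

In an actual trial the callback would refit the model on `history` (together with the observed responses, omitted here for brevity) before returning the probabilities.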



Under the above assumptions (see also Condition A of Zhang et al. (2007), Theorem
2.1), it is proved that as $\min(N_{m,k}, k = 1, \ldots, K)$ goes to infinity,

$$\sqrt{n}(\hat\theta - \theta) \to_{\mathcal{L}} N(0, V),$$

where $V = \mathrm{diag}\{V_1, \ldots, V_K\}$. Based on the asymptotic normality of $\hat\theta$, the sequential
method is employed for estimating the confidence set of $\theta = (\theta_1, \ldots, \theta_K)$. Define

$$R = \{\theta \in \Theta : n(\hat\theta - \theta)' V^{-1} (\hat\theta - \theta) \le C_\alpha^2\},$$

where $C_\alpha^2$ is the constant such that $P(\chi^2(p \cdot K) > C_\alpha^2) \le \alpha$. The asymptotic normality of
$\hat\theta$ implies that $P(\theta \in R) \approx 1 - \alpha$ as the sample size becomes large.

Although large sample results guarantee the performance of estimates and some asymptotic
properties of CARA designs, we want to know just how large a sample size is needed
to guarantee a satisfactory performance in a practical sense. Moreover, no matter how
high the coverage probability is, the confidence set becomes less useful if the size of the
confidence set becomes too large. Now, suppose we further require that the maximum
axis of $R$ is no larger than $2\delta$ for some $\delta > 0$; then the minimum sample size to achieve
this goal satisfies

$$n \lambda_{\min}(V^{-1}) \ge \frac{C_\alpha^2}{\delta^2}.$$

Equivalently, the above inequality can be re-written as

$$n \ge \frac{C_\alpha^2 \lambda_{\max}(V)}{\delta^2}, \qquad (2)$$

where $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ denote the maximum and minimum eigenvalues of
matrix $A$, respectively. Let $R_\delta$ denote the corresponding confidence ellipsoid for given $\delta$.
So, once $\delta > 0$ is specified, the maximum axis of the confidence ellipsoid $R_\delta$ is no greater
than $2\delta$. The constant $\delta$ here is used as a measure of precision of the confidence ellipsoid
$R_\delta$. Please refer to Siegmund (1985), Albert (1966), and Ghosh and Sen (1991) for other
measures of confidence sets.
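The step from the axis-length requirement to the sample-size bound can be spelled out as follows. This is a short derivation sketch under the stated definitions only; the eigenvectors $u_j$ and eigenvalues $\lambda_j$ of $V$ are notation introduced here for illustration.

```latex
% Spectral decomposition of V (notation assumed for this sketch):
%   V = \sum_j \lambda_j u_j u_j', with \lambda_j > 0.
% A point \theta = \hat\theta + t u_j lies in the ellipsoid iff
\[
  n\, t^2\, u_j' V^{-1} u_j = \frac{n t^2}{\lambda_j} \le C_\alpha^2
  \quad\Longleftrightarrow\quad
  |t| \le C_\alpha \sqrt{\lambda_j / n}.
\]
% Hence the full axis along u_j has length 2 C_\alpha \sqrt{\lambda_j / n},
% and the longest axis is bounded by 2\delta iff
\[
  2 C_\alpha \sqrt{\lambda_{\max}(V) / n} \le 2\delta
  \quad\Longleftrightarrow\quad
  n \ge \frac{C_\alpha^2 \lambda_{\max}(V)}{\delta^2},
\]
% which matches the first form because \lambda_{\min}(V^{-1}) = 1 / \lambda_{\max}(V).
```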

If $V$ is known, then the optimal sample size required to construct a confidence ellipsoid
$R_\delta$ with the required maximum axis no greater than $2\delta$ is

$$n_{opt} = \text{first } n \text{ such that } n \ge \frac{C_\alpha^2 \lambda_{\max}(V)}{\delta^2}.$$

Since the variance matrix $V$ is usually unknown, the above optimal sample size is not
available. Replacing the unknown $V$ in Equation (2) with its consistent estimate $\hat V$ (to
be defined later), a stopping rule to construct such a fixed size confidence ellipsoid is
suggested:

$$\tau_\delta = \text{first } n \text{ such that } n \ge \frac{C_\alpha^2 \lambda_{\max}(\hat V)}{\delta^2}
= \inf\left\{n \ge n_0 : n \ge \frac{C_\alpha^2 \lambda_{\max}(\hat V)}{\delta^2}\right\}, \qquad (3)$$

where $n_0 \ge K m_0$ is the minimum initial sample size and $m_0$ is the initial sample size for
each treatment. Similarly, we then define

$$\hat R_\delta = \{\theta \in \Theta : n(\hat\theta - \theta)' \hat V^{-1} (\hat\theta - \theta) \le C_\alpha^2\}.$$

It follows from the strong consistency of $\hat\theta$ that, if $\hat V$ is also a strongly consistent estimate
of $V$, then $\lim_{n \to \infty} P(\theta \in \hat R_\delta) = 1 - \alpha$. That is, $\hat R_\delta$ is a confidence ellipsoid of $\theta$ with
coverage probability $1 - \alpha$, asymptotically.
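As a rough illustration of how such a stopping rule could be checked in practice, the sketch below scans a sequence of estimated largest eigenvalues and returns the first sample size meeting the criterion. It is a sketch under stated assumptions: the function names are hypothetical, the eigenvalue sequence is supplied by the caller, and the chi-square constant is computed only for 2 degrees of freedom, where a closed form exists.

```python
import math


def chi2_quantile_df2(alpha):
    """Upper-alpha quantile of a chi-square distribution with 2 degrees of freedom.

    With 2 degrees of freedom the survival function is exp(-x / 2),
    so the quantile is -2 * log(alpha) in closed form.
    """
    return -2.0 * math.log(alpha)


def stopping_time(lam_max_estimates, c2, delta, n0):
    """Return the first n >= n0 with n >= c2 * lam_max_hat_n / delta**2.

    lam_max_estimates[n - 1] is the largest eigenvalue of the covariance
    estimate based on the first n subjects (supplied by the caller).
    Returns None if the criterion is never met within the sequence.
    """
    for n, lam in enumerate(lam_max_estimates, start=1):
        if n >= n0 and n >= c2 * lam / delta ** 2:
            return n
    return None


# Illustrative use: an eigenvalue estimate that settles down to 4.0.
c2 = chi2_quantile_df2(0.05)
estimates = [4.0 + 10.0 / n for n in range(1, 1001)]
tau = stopping_time(estimates, c2, delta=0.3, n0=10)
```

With a known, constant eigenvalue the same function returns the optimal sample size, which makes it easy to compare the two quantities in a simulation.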

It follows from the definition of $\tau_\delta$ that, when the sequential sampling stops, the
confidence ellipsoid will have its maximum axis no greater than $2\delta$. However, it is also
known that there is no guarantee that $\hat\theta$ will have the same asymptotic distribution if
we replace the fixed sample size with a random sample size $\tau_\delta$. Although the sequential
estimation procedure provides a way to control the size of the confidence set by utilizing a
stopping rule, it is of interest to know whether the asymptotic properties in Zhang et al.
(2007) still hold under such a random stopping rule.

Suppose that the allocation function $\pi(\cdot, \cdot) = (\pi_1(\cdot, \cdot), \ldots, \pi_K(\cdot, \cdot))$ satisfies the
following conditions:

(C1) $\sum_{k=1}^{K} \pi_k = 1$ and $0 < \nu_k = E_\xi[\pi_k(\theta, \xi)] < 1$, $k = 1, \ldots, K$.

(C2) For fixed $\xi$, $\pi_k(\theta, \xi) > 0$ is a continuous function of $\theta$ and is differentiable with
respect to $\theta$ such that $\nu_k(\hat\theta) = \nu_k(\theta) + (\hat\theta - \theta)(\partial \nu_k / \partial \theta)' + o(\|\hat\theta - \theta\|^{1+\zeta})$ for some
$\zeta > 0$.

The condition $\pi_k > 0$ for each $k = 1, \ldots, K$ on the allocation function guarantees that
subjects will be allocated to individual treatments, eventually. Thus, this condition also
affirms that with probability one the design matrix is non-singular, and $\lambda_{\min}(V^{-1}) > 0$,
asymptotically. Under these conditions, in Theorem 1 we show that the sequential
procedure with the stopping rule defined in (3) can guarantee that the size of the maximum
axis of the confidence ellipsoid is no greater than the pre-specified length, while maintaining
the required coverage probability. In addition to classical asymptotic properties of
sequential confidence set estimation, the asymptotic properties of the allocation function
under sequential sampling that is based on the CARA design are also proved in Theorem 1.

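Theorem 1 Under some regularity conditions on the link function $\mu_k$ and Conditions
(C1) and (C2) for the allocation function $\pi_k$, for each $k$, if $\sup_m \|\xi_m\| < \infty$, then the
proposed sequential estimation with the stopping rule defined in (3) has the following
properties:

(i) $P(\tau_\delta < \infty) = 1$ and $\lim_{\delta \to 0} \tau_\delta / n_{opt} = 1$ almost surely.

When the sampling stops, the estimate of $\theta$ satisfies that

(ii) $\hat\theta_{\tau_\delta} \to \theta$ almost surely as $\delta \to 0$, $\sqrt{\tau_\delta}(\hat\theta_{\tau_\delta} - \theta) \to_{\mathcal{L}} N(0, V)$, and
$\lim_{\delta \to 0} P(\theta \in \hat R_\delta) = 1 - \alpha$.

Then, in addition, the average of the stopping rule satisfies that

(iii) $\lim_{\delta \to 0} E[\tau_\delta / n_{opt}] = 1$.

Moreover, for a given allocation function, it is shown that

(iv) $\lim_{\delta \to 0} N_{\tau_\delta} / \tau_\delta = \nu$ almost surely,

(v) $N_{\tau_\delta, k | \xi} / N_{\tau_\delta | \xi} \to \pi_k(\theta, \xi)$ a.s. as $\delta \to 0$, $k = 1, \ldots, K$, and

(vi) $\sqrt{\tau_\delta}(N_{\tau_\delta} / \tau_\delta - \nu) \to_{\mathcal{L}} N(0, \Sigma)$;

where $N_{\tau_\delta, k | \xi}$ is the number of subjects assigned to treatment $k$ with covariate $\xi$ up to the
$\tau_\delta$-th subject and $N_{\tau_\delta | \xi}$ is the total number of subjects with covariate $\xi$ up to the $\tau_\delta$-th subject.
Here $\nu = (\nu_1, \ldots, \nu_K)'$ and $\pi_k$, $k = 1, \ldots, K$, depend on the allocation function, and
$\Sigma = \Sigma_1 + 2\Sigma_2$ where $\Sigma_1 = \mathrm{diag}\{\nu\} - \nu'\nu$ and $\Sigma_2 = \sum_{k=1}^{K} (\partial \nu / \partial \theta_k) V_k (\partial \nu / \partial \theta_k)'$.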

Theorem 1 (i) states that the sequential sampling will stop eventually, and (ii) and (iii)
are called asymptotic consistency and efficiency of a sequential confidence estimation
procedure by Chow and Robbins (1965). Theorem 1 (iii) means that the average ratio of
the sequential sample size to the optimal sample size converges to 1. This means the proposed
sequential sampling is efficient in terms of the sample size used for constructing a fixed size
confidence ellipsoid of the parameters of interest.

Theorem 1 (iv) to (vi) provide the asymptotic properties of the allocation rule under
the sequential estimation procedure. In particular, Theorem 1 (iv) states that eventually
the allocation proportion converges to the allocation expectation $\nu$, and Theorem 1 (v)
states that for the given covariate $\xi$, the proportion of allocation converges to the "true"
(unknown) allocation probability with probability one as $\delta$ goes to zero. That is, if
the conditions in Theorem 1 are satisfied, then under the proposed sequential sampling
method, the allocation rule maintains the same asymptotic properties as those in its
non-sequential sampling counterpart. In our simulation study, we demonstrate our
procedure using the allocation rule proposed in Bandyopadhyay et al. (2007). Please
refer to Zhang et al. (2007) for different allocation functions/designs under this general
framework.

Remark 1 Note that the proof of the properties of the sequential procedure is not trivial,
and cannot follow directly from the results of the estimates based on the non-random
sample size case due to the application of the stopping rule. This can be seen from a
simple example in Chow and Teicher (1988) (Chapter 4, Example 1, page 90). Since our
proof of Theorem 1 is based on the last time approach of Chang (1999), some conditions
on the parameter space can be relaxed. Details are given in the Appendix.

2.1 Subset of parameters

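Sometimes, we are only interested in contrasts of parameters. For example, instead of
estimating individual treatment effects, we may want to estimate differences between
treatment effects in a clinical trial with multiple treatments. For this purpose, let $H$ be a
$p \times h$ matrix that specifies the contrasts with $0 < \mathrm{Rank}(H) = h \le p$. Let $\gamma = H'\theta$; then
the asymptotic properties of $\hat\theta$ imply that as $n \to \infty$

$$\sqrt{n}(\hat\gamma_n - \gamma) \to_{\mathcal{L}} N(0, V_\gamma),$$

where $V_\gamma = H'VH$. Let $\hat V_\gamma = H'\hat V H$. Then $\hat V_\gamma$ is a strongly consistent estimate of $V_\gamma$.
Therefore, it follows that $n(\hat\gamma - \gamma)' \hat V_\gamma^{-1} (\hat\gamma - \gamma)$ is asymptotically distributed as $\chi^2(h)$.
Let

$$R_{\delta_\gamma} = \{\gamma \in \mathbb{R}^h : n(\hat\gamma - \gamma)' \hat V_\gamma^{-1} (\hat\gamma - \gamma) \le C_{\alpha,h}^2\}. \qquad (4)$$

Similarly, we can also construct a confidence ellipsoid of $\gamma$ with the length of its maximum
axis no greater than $2\delta_\gamma$. Then the optimal sample size and its corresponding stopping
time are

$$n_{\gamma,opt} = \text{first } n \text{ such that } n \ge \frac{C_{\alpha,h}^2 \lambda_{\max}(V_\gamma)}{\delta_\gamma^2} \qquad (5)$$

and

$$\tau_{\delta_\gamma} = \inf\left\{n \ge n_0 : n \ge \frac{C_{\alpha,h}^2 \lambda_{\max}(\hat V_\gamma)}{\delta_\gamma^2}\right\}. \qquad (6)$$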
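To make the contrast version concrete, the following sketch computes $V_\gamma = H'VH$ and the corresponding optimal sample size for two contrasts ($h = 2$). It is illustrative only: the helper names are hypothetical, the covariance matrix passed in is a placeholder, and the chi-square constant again uses the closed form available for 2 degrees of freedom.

```python
import math


def mat_mul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def lam_max_sym2(M):
    """Largest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    return 0.5 * (a + d) + math.sqrt(0.25 * (a - d) ** 2 + b * b)


def contrast_cov(H_t, V):
    """V_gamma = H' V H, where H_t holds the rows of H'."""
    return mat_mul(mat_mul(H_t, V), transpose(H_t))


def n_gamma_opt(V_gamma, c2, delta):
    """Smallest integer n with n >= c2 * lambda_max(V_gamma) / delta**2."""
    return math.ceil(c2 * lam_max_sym2(V_gamma) / delta ** 2)


# Illustrative inputs: H' with rows (1, -1, 0, 0) and (0, 0, 1, -1),
# and an identity matrix as a placeholder for V.
H_t = [[1, -1, 0, 0], [0, 0, 1, -1]]
V = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
c2 = -2.0 * math.log(0.05)  # chi-square(2) upper 5% point
n_opt = n_gamma_opt(contrast_cov(H_t, V), c2, delta=0.3)
```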

By simple matrix algebra, we have a parallel theorem to Theorem 1 for contrasts of
parameters.

Theorem 2 Let $H$ be a $p \times h$ matrix with $\mathrm{Rank}(H) = h \le p$, and $\gamma = H'\theta$. Then
under conditions similar to Theorem 1, $\hat\gamma$ is a strongly consistent estimate of $\gamma$ and
asymptotically normally distributed with covariance matrix $V_\gamma = H'VH$. Moreover, the
sequential procedure with the stopping rule defined in (6) has the following asymptotic
properties:

(i) $P(\tau_{\delta_\gamma} < \infty) = 1$ and $\lim_{\delta_\gamma \to 0} \tau_{\delta_\gamma} / n_{\gamma,opt} = 1$ almost surely.

(ii) $\hat\gamma_{\tau_{\delta_\gamma}} \to \gamma$ almost surely as $\delta_\gamma \to 0$, $\sqrt{\tau_{\delta_\gamma}}(\hat\gamma_{\tau_{\delta_\gamma}} - \gamma) \to_{\mathcal{L}} N(0, V_\gamma)$; and
$\lim_{\delta_\gamma \to 0} P(\gamma \in R_{\delta_\gamma}) = 1 - \alpha$.

(iii) $\lim_{\delta_\gamma \to 0} E[\tau_{\delta_\gamma} / n_{\gamma,opt}] = 1$,

where $R_{\delta_\gamma}$ and $n_{\gamma,opt}$ are defined in (4) and (5), respectively.

The main difference between the new stopping rule defined in Equation (6) and the
previous one is the variance of $\hat\gamma$, and this difference in $\tau_{\delta_\gamma}$ does not affect the allocation
rule. Therefore, the asymptotic properties of the allocation rule in Theorem 2 follow from
the same arguments as in the proof of Theorem 1. In fact, the asymptotic properties of
the allocation rule remain the same under this stopping rule, and are not re-stated here.
That is, this sequential estimation procedure allows us to compare treatment effects using
a contrast estimation method under a CARA design without disturbing the asymptotic
properties of the allocation function, which is a useful feature in practice.

Remark 2 Note that the asymptotic properties of the allocation function in Theorem 2
will remain the same as those in Theorem 1 when $\delta_\gamma$ becomes small. However, intuitively,
the sequential sample sizes should converge at different rates, depending on the contrasts.
This property is usually reflected in the second order term of the stopping time and is not
shown in Theorem 2.



3. Numerical Study 

The purpose of the numerical study is to look at the performance of the estimate of the
treatment effect and the allocation of subjects. To illustrate the sequential confidence
estimation procedure proposed in Section 2 for $K$ treatments and the treatment allocation
procedures in Section 3.1, we consider a binary response case in
this study using the logistic model.

3.1 Treatment Allocation Rule

In order to skew the treatment allocation proportion so that the better treatment is
allocated more often, Bandyopadhyay et al. (2007) suggest using the utility function
below. For $K$ treatments, their utility function is defined as

$$U(p) = \log |I_{n+1}(p)| - \eta \sum_{k=1}^{K} p_k \log\left\{\frac{p_k}{\hat\pi_k(\hat\theta, \xi)}\right\}, \qquad (7)$$

where $\hat\pi_k(\hat\theta, \xi)$ is the estimate of $\pi_k(\theta, \xi)$, denoting the estimate of the allocation
probability for treatment $k$ up to the current stage $n$. For a given $\xi$ and the current estimate
of $\theta$, the optimal allocation rule is to find the vector of probabilities $p = (p_1, \ldots, p_K)$
that maximizes the utility function above. That is, the design at the $(n + 1)$th stage is to
allocate the $(n + 1)$th subject to the treatment that maximizes the utility function.

In the utility function, the first term is in $\log n$ scale, which is a log determinant of
the information matrix. If $\eta = 0$, then the new subject is selected to maximize the Fisher
information matrix, which is referred to as the piecewise D-optimal design as mentioned
in Bandyopadhyay et al. (2007). On the other hand, if $\eta$ goes to $\infty$, then the optimal
value of $p$ is determined by the relative entropy function, the second term of (7), which was
also raised in Bandyopadhyay and Biswas (2001). Hence, the parameter $\eta$ can be used
to adjust the ethical and efficiency balance. Here we use a utility function to balance the
needs for estimation precision of treatment effects and the ethical consideration. It leads
to the (locally) D-optimal design.
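The trade-off governed by $\eta$ can be explored numerically. The sketch below is a hedged illustration for $K = 2$ with a two-dimensional covariate vector $(1, \xi)$: it assumes a utility of the form "log determinant of the updated block-diagonal information minus $\eta$ times the relative entropy of $p$ with respect to $\hat\pi$", and maximizes it by a simple grid search over $p_1$. The functional form, the grid search, and all names are assumptions made for illustration, not the procedure of Bandyopadhyay et al. (2007).

```python
import math


def log_det_update(info, weight, z):
    """log det(info + weight * z z') for a 2x2 matrix info and a 2-vector z."""
    a = info[0][0] + weight * z[0] * z[0]
    b = info[0][1] + weight * z[0] * z[1]
    d = info[1][1] + weight * z[1] * z[1]
    return math.log(a * d - b * b)


def utility(p1, infos, lams, z, pi_hat, eta):
    """Assumed utility: efficiency term minus eta times relative entropy."""
    p = (p1, 1.0 - p1)
    # The information matrix is block diagonal, so its log determinant
    # is the sum of the per-treatment log determinants.
    log_det = sum(log_det_update(infos[k], p[k] * lams[k], z) for k in range(2))
    rel_entropy = sum(p[k] * math.log(p[k] / pi_hat[k]) for k in range(2))
    return log_det - eta * rel_entropy


def best_p1(infos, lams, z, pi_hat, eta, grid=99):
    """Grid search for the allocation probability of treatment 1."""
    candidates = [(i + 1) / (grid + 1) for i in range(grid)]
    return max(candidates, key=lambda p1: utility(p1, infos, lams, z, pi_hat, eta))
```

With equal information on both arms and $\eta = 0$ the search returns a balanced allocation, while a very large $\eta$ pulls the solution toward $\hat\pi$, which mirrors the two limiting cases described in the text.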

At the beginning of a study, when estimates of treatment effects are not reliable, we
can improve the precision of the estimation of treatment effects when allocating patients
via a utility function. Since the estimate of treatment effects becomes stable as the sample
size becomes large, it is reasonable to move the weight gradually toward the ethical part at
the later stage of the study. If there is sufficient information on treatment effects, we tend
to allocate more patients to the better treatment. That is, unlike the two-stage design in
Bandyopadhyay et al. (2007), we now have more flexibility to alter the parameters of the
utility function as sampling goes on such that the needs for estimating treatment effects
and the ethical consideration can be fulfilled and balanced.
The second term in the utility function involves $\hat\pi_k(\hat\theta, \xi)$. Modifying the utility function
of Bandyopadhyay et al. (2007), $\hat\pi_k(\hat\theta, \xi)$ can be defined as follows with $K = 2$ for
illustration purposes:

$$\hat\pi_1(\hat\theta, \xi) = J\left(\frac{\xi'\hat\theta_1 - \xi'\hat\theta_2}{T_n}\right) \quad \text{and} \quad \hat\pi_2(\hat\theta, \xi) = 1 - \hat\pi_1(\hat\theta, \xi),$$

where $J(t)$ can be any symmetric function. $\hat\pi_k(\hat\theta, \xi)$ can vary sequentially through $T_n$ at
each allocation. Both $T_n$ and $\eta$ can serve as tuning parameters between efficiency and
ethics and can be random depending on the estimate precision, which can be a function of
the standard deviation of the treatment effect estimate based on cumulative observations
up to the $n$th subject. Please note that $T_n$ and $\eta$ are also tuned by the new covariate $\xi$
of the $(n + 1)$th subject. Through numerical studies, Bandyopadhyay et al. (2007)
provide tables with estimates of allocation proportions for several values of $\eta$ and given $T_n$
for two-stage CARA designs. In the simulation study below, we present numerical results with some
suggestions for tuning both $T_n$ and $\eta$, and the proposed sequential procedure is also
evaluated with its correct allocation probability.
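The role of the scale parameter in the allocation probability can be illustrated with a small sketch. The source only requires $J$ to be symmetric; the logistic distribution function used below is an assumption chosen for illustration, as are the function names and the covariate vector $(1, \xi)$.

```python
import math


def logistic_cdf(t):
    """One possible symmetric choice for J: J(-t) = 1 - J(t)."""
    return 1.0 / (1.0 + math.exp(-t))


def allocation_probs(theta1, theta2, z, T):
    """Return (pi_1, pi_2) with pi_1 = J((z'theta1 - z'theta2) / T).

    theta1, theta2: current parameter estimates for the two treatments.
    z: covariate vector of the incoming subject, e.g. (1, xi).
    T: positive scale parameter; larger T pulls pi_1 toward 0.5.
    """
    diff = sum(zi * (a - b) for zi, a, b in zip(z, theta1, theta2))
    pi1 = logistic_cdf(diff / T)
    return pi1, 1.0 - pi1


# Illustrative values: intercept 0.1 for both arms, slopes -1 and 1.
sharp = allocation_probs((0.1, -1.0), (0.1, 1.0), (1.0, 2.0), T=0.5)
flat = allocation_probs((0.1, -1.0), (0.1, 1.0), (1.0, 2.0), T=2.0)
```

A small scale gives near-deterministic allocation to the arm that currently looks better, while a large scale keeps the allocation close to balanced, which is one way to read the sensitivity to this tuning parameter reported in the simulations.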

3.2 Application to Logistic Models 

Suppose $Y_k = 1\,(0)$ denotes a response variable with success (failure) from a subject
assigned to treatment $k$ for $k = 1, \ldots, K$. Let $\mu_k(\theta_k, \xi) = E[Y_k = 1 \mid \xi]$, and $\theta_k = (\alpha_k, \theta_k^*)$.
Assume that

$$\mathrm{logit}(\mu_k(\theta_k, \xi)) = \alpha_k + \theta_k^* \xi, \quad k = 1, \ldots, K. \qquad (8)$$

Since the covariate vector can be redefined as $(1, \xi)'$, without loss of generality, we assume
that $\alpha_k = 0$, $k = 1, \ldots, K$. Suppose there are $m_0$ initial samples for each treatment and
assume that we are at the $m$th stage with $m > K m_0$. Then the MLE $\hat\theta_{m,k}$ of $\theta_k$, for
$k = 1, \ldots, K$, is the one that maximizes



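$$\sum_{i=1}^{m} X_{i,k} \left\{ Y_{i,k} \log \mu_{i,k} + (1 - Y_{i,k}) \log(1 - \mu_{i,k}) \right\}, \qquad (9)$$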



where $\mu_{i,k} = \mu_k(\theta_k, \xi_i)$. It follows that the conditional Fisher information matrix, for
given $\xi$, is

$$I_k(\theta_k \mid \xi) = \mu_k(\theta_k, \xi)(1 - \mu_k(\theta_k, \xi))\, \xi \xi'.$$

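Let $\hat I_{n,k} = \sum_{i=1}^{n} X_{i,k} I_k(\hat\theta_{n,k} \mid \xi_i)$ be the estimate of $I_k$ for all $k$. Then for a $K$-treatment
problem, for example, the new design is chosen such that the Fisher information matrix
$I_{n+1}$ is maximized, if we assume $\eta = 0$, where $I_{n+1} = I_n + i_{n+1}$,

$$i_{n+1} = \begin{pmatrix} p_1 \lambda_{j,1} \xi_j \xi_j' & & 0 \\ & \ddots & \\ 0 & & p_K \lambda_{j,K} \xi_j \xi_j' \end{pmatrix}, \qquad (10)$$

and $\lambda_{i,k} = \mu_{i,k}(1 - \mu_{i,k})$ for $i = 1, \ldots, n, j$, with $j = n + 1$, and $k = 1, \ldots, K$.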
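The information quantities above are straightforward to compute. The sketch below evaluates the per-subject logistic information $\mu(1-\mu)\, z z'$ and accumulates it over the subjects assigned to one treatment. It is a minimal sketch assuming the covariate vector is $z = (1, \xi)$; the function names are hypothetical.

```python
import math


def mu(theta, z):
    """Success probability under the logistic model with linear predictor z'theta."""
    lin = sum(t * zi for t, zi in zip(theta, z))
    return 1.0 / (1.0 + math.exp(-lin))


def fisher_info(theta, z):
    """Per-subject information mu * (1 - mu) * z z' as a list of rows."""
    m = mu(theta, z)
    lam = m * (1.0 - m)
    return [[lam * zi * zj for zj in z] for zi in z]


def accumulated_info(theta_hat, assigned, covariates):
    """Sum of per-subject information over subjects assigned to this treatment.

    assigned[i] is 1 if subject i received the treatment and 0 otherwise;
    covariates[i] is that subject's covariate vector.
    """
    p = len(theta_hat)
    total = [[0.0] * p for _ in range(p)]
    for x, z in zip(assigned, covariates):
        if x:
            info = fisher_info(theta_hat, z)
            for i in range(p):
                for j in range(p):
                    total[i][j] += info[i][j]
    return total
```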



3.2.1 Parameter Setup and Simulation Results 

Suppose that $K = 2$; that is, we assume logistic models with binary responses, two
treatments and one continuous covariate $\xi$. In the logistic models, we assume equal
intercepts for both treatments, $(\alpha_1, \alpha_2) = (0.1, 0.1)$, and regression coefficients $(\theta_1^*, \theta_2^*) =
(-1, 1)$. The covariate is generated from a mixed normal distribution with means 2
and $-2$ and equal variance 1, with respective probability 0.5. Since the treatment effect is
defined as a function of differences of intercepts and regression coefficients between the
two treatments, we apply the stopping rule for the contrasts of parameters, $\gamma = H'\theta$,
given in Section 2.1. Thus, the transpose of the contrast $H$ is defined as a matrix with
its first row $(1, -1, 0, 0)$ and its second row $(0, 0, 1, -1)$, and the vector of parameters $\theta$
is $(\alpha_1, \alpha_2, \theta_1^*, \theta_2^*)'$.
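The simulation setup just described can be written down compactly. The sketch below encodes the stated parameter values, draws covariates from the two-component normal mixture, and evaluates the contrast $\gamma = H'\theta$. Treating "the better treatment" as the one with the higher true success probability is an interpretation assumed here for illustration, and the function names are hypothetical.

```python
import math
import random

ALPHA = (0.1, 0.1)      # intercepts for treatments 1 and 2
SLOPE = (-1.0, 1.0)     # regression coefficients for treatments 1 and 2
H_T = [[1, -1, 0, 0],   # rows of H': intercept difference,
       [0, 0, 1, -1]]   # slope difference


def draw_covariate(rng):
    """Mixture of N(2, 1) and N(-2, 1) with equal weights."""
    mean = 2.0 if rng.random() < 0.5 else -2.0
    return rng.gauss(mean, 1.0)


def success_prob(k, xi):
    """True success probability of treatment index k (0 or 1) at covariate xi."""
    return 1.0 / (1.0 + math.exp(-(ALPHA[k] + SLOPE[k] * xi)))


def better_treatment(xi):
    """Index of the treatment with the higher true success probability (assumed criterion)."""
    return 0 if success_prob(0, xi) >= success_prob(1, xi) else 1


def contrast(theta):
    """gamma = H' theta for theta = (alpha_1, alpha_2, slope_1, slope_2)."""
    return [sum(h * t for h, t in zip(row, theta)) for row in H_T]


rng = random.Random(2024)
covariates = [draw_covariate(rng) for _ in range(200)]
gamma = contrast([ALPHA[0], ALPHA[1], SLOPE[0], SLOPE[1]])
```

Under these parameter values the two response curves cross at $\xi = 0$, so the two mixture components correspond to sub-populations favoring different treatments.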

Precision $\delta$ is set to 0.3, and the initial sample size for each treatment, $m_0$, is set to
5, 10, and 15. Several combinations of the tuning parameters $T_0$ and $\eta$ are considered: 0.5, 1
and 2 for $T_0$ and 0, 0.1 and 1 for $\eta$. Both fixed and varying tuning parameters, $T_0$ and $\eta$,
are considered; that is, $T_0$ and $\eta$ are either fixed until the study stops, or vary whenever a new
observation is added in such a way that $T_0$ is proportional and $\eta$ is inversely proportional to
the standard deviation of the treatment effect for a given covariate of a new observation.
Findings from simulation studies are as follows:

As $\eta$ gets larger, the stopping time gets larger, but its increase is reduced as the initial sample
size gets larger. Varying $\eta$ does not give results that are significantly different from fixed
$\eta$ unless $T_0$ varies as well. The stopping time is very unstable when the initial sample size $m_0$ is
small, such as 5, due to unstable regression coefficient estimates at the beginning stage if
$T_0$ is 0.5 or 2. As the initial sample size gets larger, the stopping time gets earlier and its variation
gets smaller. The coverage probabilities of treatment differences are reasonably close to
the nominal level 0.95 and become closer to 0.95 as the initial sample size $m_0$ gets larger.
Based on these findings, it is recommended that, in order to obtain an earlier and more stable
stopping time with a given precision satisfied, the initial sample size should not be too small.

When $\eta = 0$, correct treatment allocation probabilities are about 0.5, since this is
equivalent to randomized allocation as there is no ethical consideration in the utility function.
As $\eta$ gets larger, correct treatment allocation gets better, with similar performance for
all positive $\eta$. This confirms that $\eta$ plays a role as a tuning parameter for ethical
consideration and that a small, nonzero $\eta$ is sufficient for correct allocation. Large correct allocation
probabilities for positive $\eta$, in Table 1, illustrate that our sequential procedure under the
CARA designs successfully implements the idea of CARA designs, with more allocation
to the better treatment, as in the non-sequential counterpart.

For positive $\eta$, correct allocation is high and close to 0.9 when $T_0 = 1$ or when the initial
$T_0 = 0.5$ with varying $T_0$. However, it is lower when $T_0 = 0.5$ with fixed $T_0$ compared to
varying $T_0$, or when $T_0 = 2$ with varying $T_0$ compared to fixed $T_0$. If $T_0$ varies depending on
treatment effect variation, it becomes larger than its initial value. Thus, varying a small $T_0$
gives better allocation due to the reasonable tuning size of $T_0$; however, varying a large $T_0$
gives worse allocation due to a too liberal tuning of $T_0$. This emphasizes the importance
of selecting a reasonably sized $T_0$.

4. Discussion

In this paper, we propose a sequential estimation scheme for the CARA design in clinical 
trials. In this sequential estimation procedure, allocation function and design depend not 
only on previously collected information and sequential estimates of treatment effects, 
but also on the covariate information of individual subjects. The proposed sequential 
estimation is based on the martingale estimating equation, which differs from some clas- 
sical sequential methods that rely on independent observations. The stopping rule used 
here depends on the observed Fisher information, which guarantees the precision of the 
estimates of treatment effects, and is novel in CARA-design-based clinical trials. The
procedure discussed here is rather general and can be applied to other generalized linear
models. We demonstrate our method using logistic regression models in a two-treatment
case. The theorems derived in this work hold for general allocation rules and require
only mild conditions on the allocation function. Sharper results may be obtainable when
a specific allocation rule is given.

As shown in Figure 1, it is very difficult to allocate the most suitable treatment
to subjects in the vicinity of the intersection of the two treatment-effect lines. This is
especially the case when the difference between the slopes of the treatments is small. Thus,
instead of the strictly concave function used in our numerical study, a concave function
with a plateau may be considered. In our experience with the numerical studies, large
changes in $T_0$ during the sequential procedure may lower the correct allocation
probability. Hence, from a practical viewpoint, a reasonably sized $T_0$ should be chosen in
the utility function, considering all factors of a clinical trial, such as the distributions
of covariates and the intersection point of the two treatment models, among others. In
other words, prior information on the targeted sub-populations, if available, may help to
decide $T_0$. This leads to possible future research, where Bayesian statistical tools might
play an important role.
Appendix A



To apply sequential sampling to CARA designs, we need to extend Anscombe's theorem
to adaptive designs. From the proof of Anscombe's theorem (see Woodroofe 1982,
page 11), the i.i.d. assumption is not necessary; in fact, the proof only requires the
sequence of partial sums to satisfy the u.c.i.p. (uniform continuity in probability)
condition. This is sufficient for applying Anscombe's theorem. The lemma below shows that
the sequence of normalized partial sums of martingale differences also satisfies the
u.c.i.p. condition. The arguments below are similar to those of Woodroofe (1982),
Example 1.8.
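
For reference, the u.c.i.p. condition and the way it enters Anscombe's theorem can be sketched as follows. This is a paraphrase of the standard statements rather than a quotation from Woodroofe (1982); the symbols $t_a$ (integer-valued random times) and $n_a$ (deterministic constants with $n_a \to \infty$) are notation assumed only for this sketch.

```latex
% A sequence Y_1, Y_2, ... is u.c.i.p. if
\forall\, \varepsilon > 0 \ \ \exists\, \delta > 0 : \quad
P\Big( \max_{0 \le k \le n\delta} |Y_{n+k} - Y_n| \ge \varepsilon \Big) < \varepsilon
\quad \text{for all } n \ge 1 .

% Anscombe-type conclusion (no i.i.d. assumption on the underlying data is used):
Y_n \Rightarrow Y, \quad \{Y_n\} \ \text{u.c.i.p.}, \quad
t_a / n_a \to 1 \ \text{in probability}
\ \ \Longrightarrow \ \ Y_{t_a} \Rightarrow Y \quad (a \to \infty).
```

In the proofs below, the role of $Y_n$ is played by the normalized estimation error and the role of $t_a$ by the stopping time.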



Lemma 1 Let $X_1, X_2, \ldots$ be a sequence of martingale differences with respect to an
increasing sequence of $\sigma$-fields $\mathcal{F}_i$, $i = 0, 1, \ldots$; that is,
$E[X_i \mid \mathcal{F}_{i-1}] = 0$ for all $i \ge 1$. Suppose that there is a constant $M$ such
that $E[\|X_i\|^2] < M < \infty$ for all $i$. Then
$Y_n = S_n^* = \sum_{i=1}^{n} X_i / \sqrt{n}$ satisfies the u.c.i.p. condition.

Proof of Lemma 1



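For all $k, n \ge 1$,

$$|S^*_{n+k} - S^*_n| \le n^{-1/2}\,|S_{n+k} - S_n| + \left[1 - \sqrt{n/(n+k)}\right] |S^*_n|,$$

where $S_n = \sum_{i=1}^{n} X_i$. If $\varepsilon, \delta > 0$ and $k \le n\delta$, then the second
term on the right-hand side is bounded by $C(\delta)|S^*_n|$, where
$C(\delta) = 1 - (1 + \delta)^{-1/2}$ and

$$P\left(C(\delta)|S^*_n| > \tfrac{1}{2}\varepsilon\right) \le P\left(|S^*_n| > \frac{\varepsilon}{2C(\delta)}\right) \to 0 \quad \text{as } \delta \to 0,$$

since $|S^*_n|$ is stochastically bounded. Because the $X_i$'s are martingale differences,
instead of Kolmogorov's inequality we apply the Hájek-Rényi inequality (see Chow and
Teicher 1988, Theorem 8 (iii), page 247). Then it is shown that

$$P\left(\max_{k \le n\delta} |S_{n+k} - S_n| > \frac{\varepsilon\sqrt{n}}{2}\right) \le \left(\frac{4}{n\varepsilon^2}\right) n\delta M = \frac{4\delta M}{\varepsilon^2},$$

which is independent of $n$ and goes to zero as $\delta \to 0$. Therefore, $S^*_n$, $n \ge 1$,
satisfies the u.c.i.p. condition.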
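
The two tail bounds in the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the martingale-difference sequence (a random sign scaled by a factor that depends only on the past), the choice $M = 1$, and the Chebyshev form of the bound on the second term are all assumptions made for illustration.

```python
import math
import random

def c_delta(delta):
    """C(delta) = 1 - (1 + delta)^(-1/2), the constant from the proof."""
    return 1.0 - (1.0 + delta) ** -0.5

def lemma_bound(delta, eps, M):
    """Sum of the two tail bounds: 4*delta*M/eps^2 for the increment term,
    plus a Chebyshev bound 4*C(delta)^2*M/eps^2 for the C(delta)|S*_n| term."""
    return 4.0 * delta * M / eps**2 + 4.0 * c_delta(delta) ** 2 * M / eps**2

def max_oscillation(n, delta, rng):
    """max over 1 <= k <= n*delta of |S*_{n+k} - S*_n| along one simulated path."""
    s = 0.0
    s_star_n = 0.0
    worst = 0.0
    for i in range(1, n + int(n * delta) + 1):
        scale = 1.0 if s >= 0 else 0.5        # depends only on the past
        s += scale * rng.choice((-1.0, 1.0))  # conditional mean zero, E[X^2] <= 1
        if i == n:
            s_star_n = s / math.sqrt(n)
        elif i > n:
            worst = max(worst, abs(s / math.sqrt(i) - s_star_n))
    return worst

def exceed_frequency(n, delta, eps, reps=200, seed=0):
    """Empirical frequency of the event {max oscillation > eps}."""
    rng = random.Random(seed)
    hits = sum(max_oscillation(n, delta, rng) > eps for _ in range(reps))
    return hits / reps

freq = exceed_frequency(n=400, delta=0.05, eps=0.5)
bound = lemma_bound(delta=0.05, eps=0.5, M=1.0)
```

Here `freq` estimates the probability that appears in the u.c.i.p. condition and `bound` is the corresponding upper bound; shrinking `delta` drives `bound` to zero regardless of `n`, which is the point of the lemma.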
A.1 Last time for generalized linear models



We can apply the last time method for martingale differences, as in Chang (1999), in
our proof of asymptotic efficiency.

Let $\sigma_i^2 = \mathrm{Var}(Y_i \mid \xi_i) = \sigma^2 v(\mu_k)$. Then for fixed $\rho > 0$ and for
each $k$, define a last time variable

$$L_{k,\rho} = \sup\{n \ge 1 : (\theta - \theta_k)' \ell_{n,k}(\theta) \ge 0, \ \exists\, \theta \in \partial\Theta_{k,\rho}\},$$

where $\ell_{n,k}(\theta) = \sum_{i=1}^{n} g(\xi_i'\theta)\,\xi_i\,\{Y_{i,k} - \mu_k(\xi_i'\theta)\}$ and
$g(t) = \dot\mu_k / v(\mu_k)$, provided that the derivative $\dot\mu_k$ of $\mu_k$ exists. Then
it follows from Chang (1999) that

$$n > L_{k,\rho} \implies \hat\theta_k \text{ exists and } \hat\theta_k \in \Theta_{k,\rho}.$$

Moreover, he proved that under some regularity conditions on the covariates $\xi$,
$E L_{k,\rho} < \infty$ for all $k$. This implies that if we define the last time
$L_\rho = \max\{L_{1,\rho}, \ldots, L_{K,\rho}\}$, then $n > L_\rho$ implies that
$\hat\theta \in \Theta_\rho \subset \Theta$, where $\Theta_\rho = \prod_{k=1}^{K} \Theta_{k,\rho}$.
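
To make the last time concrete, the following sketch evaluates the defining condition in the simplest scalar case. It is an illustration under stated assumptions, not the general construction of Chang (1999): we assume a single treatment, an identity link with unit variance function, so that the score reduces to $\sum_i (Y_i - \theta)$, and a finite observation horizon, so the function returns the last exit time within the observed data only. The names `score` and `last_time` are hypothetical.

```python
def score(ys, theta):
    """Quasi-score for a scalar mean model (identity link, unit variance function)."""
    return sum(y - theta for y in ys)

def last_time(ys, theta0, rho):
    """Largest n (1-based) such that (theta - theta0) * score_n(theta) >= 0
    for some theta on the boundary {theta0 - rho, theta0 + rho}; 0 if none."""
    boundary = (theta0 - rho, theta0 + rho)
    last = 0
    for n in range(1, len(ys) + 1):
        prefix = ys[:n]
        if any((th - theta0) * score(prefix, th) >= 0 for th in boundary):
            last = n
    return last

# Hypothetical binary responses with true mean 0.5 and neighborhood radius 0.2.
ys = [1, 1, 0, 1, 0, 0, 1, 0]
L = last_time(ys, theta0=0.5, rho=0.2)
```

In this scalar case the boundary condition holds exactly when the running mean is at distance at least $\rho$ from $\theta_0$, so after the last time the estimate (here the sample mean) stays inside the $\rho$-neighborhood, which mirrors the implication displayed above.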

Note that Chang (1999) defined last times for generalized linear models, and it is clear
that for each $k \in \{1, \ldots, K\}$, Equation (1) is a special case of Chang (1999).
Zhang et al. (2007) assume that the estimate of $\theta$ lies in a compact set when the sample
size $n$ is sufficiently large. With the last time defined above, we can choose $\rho$
sufficiently small such that, for sufficiently large $n$, $\hat\theta$ falls into a compact
neighborhood of $\theta$. Hence, the assumption of Zhang et al. (2007) can be relaxed; see
Chang (1999) for further details.

Although the treatment allocation for each subject is affected by previously observed
responses, the estimate of $\theta_k$, $k = 1, \ldots, K$, is still calculated separately for
a given sample under the general CARA design. Thus the estimation of the $\theta_k$'s
for different $k$ can be treated as estimating $K$ adaptive regression models separately.
That is, for given observations, the estimation of $\theta_k$, for each $k$, does not depend on
the estimates of the other $\theta_l$, $l \ne k$. More precisely, for $k = 1, \ldots, K$, let

$$U_{m,k} = \text{collection of observations } \{(Y_{j,k}, \xi_j) \text{ with } X_{j,k} = 1 : j = 1, \ldots, m\};$$

then the estimate of $\theta_k$, say $\hat\theta_k$, is calculated based on the observations in
$U_{m,k}$ only. Thus, the property of $\hat\theta_k$ is the same as that of the MLE of a
stochastic regression model. Sequential estimation under adaptive designs has been studied
by several authors. For example, Lai and Wei (1982) studied its properties under a linear
regression setup with a general adaptive design assumption, while Chen et al. (1999) and
Chang (2001) discussed estimation under a generalized linear model setup. Their results
are applied in the proof of Theorem 1. (These three papers assume only that the design is
adaptive, without any particular design scheme. Hence, their methods are rather general
and can be applied to our case under some specific allocation rules.)



Proof of Theorem 1



It is proved in Zhang et al. (2007) that $\hat V$ is a strongly consistent estimate of $V$.
This implies that $\lambda_{\max}(\hat V)$ and $\lambda_{\min}(\hat V)$ are also strongly consistent
estimates of $\lambda_{\max}(V)$ and $\lambda_{\min}(V)$, respectively. Thus, if
$\sup_m \|\xi_m\| < \infty$, then by Lemma 1 of Chow and Robbins (1965),
$P(\tau_\delta < \infty) = 1$ and $\lim_{\delta \to 0} \tau_\delta / n_{opt} = 1$ with probability one,
which completes the proof of (i).
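
The behavior in (i) can be illustrated with a Chow-Robbins-type rule in the simplest setting. The sketch below is a scalar stand-in for the procedure of this paper and rests on several assumptions: i.i.d. Bernoulli responses instead of a CARA design, a plug-in variance $\hat p(1 - \hat p)$ in place of $\lambda_{\max}(\hat V)$, a standard normal quantile as the constant $a$, and hypothetical names `sequential_stop` and `n_opt`.

```python
import random
from statistics import NormalDist

def sequential_stop(draw, delta, alpha=0.05, n0=10, max_n=100_000):
    """Sample until n >= n0 and n >= a^2 * v_hat / delta^2; return (n, p_hat)."""
    a = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # normal quantile (assumed choice of a)
    total = 0.0
    for n in range(1, max_n + 1):
        total += draw()
        p_hat = total / n
        # Plug-in variance; the 1/n term keeps v_hat positive in early samples.
        v_hat = p_hat * (1.0 - p_hat) + 1.0 / n
        if n >= n0 and n >= a * a * v_hat / delta**2:
            return n, p_hat
    raise RuntimeError("no stop within max_n observations")

def n_opt(p, delta, alpha=0.05):
    """Fixed-sample size that would be used if the variance p(1 - p) were known."""
    a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return a * a * p * (1.0 - p) / delta**2

rng = random.Random(1)
n_stop, p_hat = sequential_stop(lambda: 1.0 if rng.random() < 0.3 else 0.0, delta=0.05)
ratio = n_stop / n_opt(0.3, 0.05)
```

The stopping time is finite by construction, and `ratio` compares it with the best fixed sample size; as `delta` decreases this ratio is expected to approach one, in line with (i).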

The highlight of the proof of (ii) is the asymptotic normality under a random sample
size. This property can usually be obtained by applying Anscombe's theorem, which relies
on the u.c.i.p. property (see Woodroofe 1982). However, under the adaptive design, some
modification is required. Thus, here we apply its modification, which is stated as Lemma 1.

The asymptotic normality of $\hat\theta$ under the adaptive design has been established by
Zhang et al. (2007) (see also Lai and Wei 1982, Chang 1999, and Chen et al. 1999).
Following from the results of (i), to prove (ii), it suffices to prove that the sequence of
normalized random sums $\{\sqrt{n}(\hat\theta_n - \theta), n \ge 1\}$ satisfies the u.c.i.p.
condition (see Woodroofe 1982 for its definition). From Equation (2.4) of Zhang et al.
(2007), we have, with probability one,

$$\hat\theta_{n,k} - \theta_k = n^{-1} \sum_{m=1}^{n} X_{m,k} h_k(Y_{m,k}, \xi_m)(1 + o(1)) + o(n^{-1/2}),$$

where the function $h_k$ satisfies $E[h_k(Y_k, \xi) \mid \xi] = 0$.

It follows from Zhang et al. (2007) that

$$\sqrt{n}(\hat\theta_{n,k} - \theta_k) = n^{-1/2} \sum_{m=1}^{n} X_{m,k} h_k(Y_{m,k}, \xi_m) + o(1)\, n^{-1/2} \sum_{m=1}^{n} X_{m,k} h_k(Y_{m,k}, \xi_m) + o(1)$$

almost surely. It is clear from the definition of u.c.i.p. that convergence with probability
one implies the u.c.i.p. property. Moreover, it follows from Lemma 1.4 of Woodroofe (1982)
that if both $U_n$ and $W_n$ are u.c.i.p., then $U_n + W_n$ is also u.c.i.p. By applying
Lemma 1, we have that $\{n^{-1/2} \sum_{m=1}^{n} X_{m,k} h_k(Y_{m,k}, \xi_m) : n \ge 1\}$ is u.c.i.p.,
which together with Lemma 1.4 of Woodroofe (1982) implies that
$\{\sqrt{n}(\hat\theta_{n,k} - \theta_k) : n \ge 1\}$ satisfies the u.c.i.p. condition. Hence,
applying Anscombe's theorem (Theorem 1.4 of Woodroofe (1982); see also Theorem 4.5.3 of
Govindarajulu (2004)), the asymptotic normality of $\hat\theta_{n,k}$ remains valid for each
$k$, which completes the proof of (ii).

It follows from (i) that, to prove (iii), it suffices to prove that
$\{\delta\tau_\delta : \delta \in (0, 1)\}$ is uniformly integrable. As discussed in Section A.1,

$$n > L_\rho \implies \hat\theta_n \in \Theta_\rho.$$

Since $\Theta_\rho$ is compact, this implies that for $n > L_\rho$,
$\lambda_{\max}(\hat V_k) \le \sup_{\theta \in \Theta_\rho} \lambda_{\max}(V_k(\theta)) \le C_{\lambda_{\max}}$
for some $C_{\lambda_{\max}} > 0$. Let $\tilde V_k = \mathrm{diag}\{O, \ldots, V_k, \ldots, O\}$ for
$k = 1, \ldots, K$, where $O$ denotes the $p \times p$ matrix of 0's. Then
$V = \sum_{k=1}^{K} \tilde V_k$. Thus,
$\lambda_{\max}(V) \le \sum_{k=1}^{K} \lambda_{\max}(\tilde V_k) \le K C_{\lambda_{\max}}$.
Hence, for $n > L_\rho$, the stopping time $\tau_\delta$ is bounded. Moreover, by applying the
last time lemma for martingale differences in Chang (1999), we have $E[L_\rho] < \infty$. This
implies that $\{\delta\tau_\delta : \delta \in (0, 1)\}$ is uniformly integrable, and the proof of
(iii) is completed.

The proofs of (iv) and (v) follow directly from Theorem 2.1, Equation (2.6), and
Theorem 2.2, Equation (2.8), of Zhang et al. (2007) and the strong consistency of
$\tau_\delta$, so they are omitted here. To prove (vi), we only need to show that
$\{n^{-1/2}(N_n - nv) : n = 1, 2, \ldots\}$ is u.c.i.p. From (A.6) of Zhang et al. (2007), we
have, with probability one,

$$N_n - nv = M_n + (1 + o(1)) \sum_{i=1}^{n} \sum_{k=1}^{K} \frac{T_{i,k}}{i} \left(\frac{\partial v}{\partial \theta_k}\right)' + o(n^{1/2}),$$

where $M_n$ and $T_n$ are multi-dimensional martingale sequences with bounded martingale
differences $\Delta M_n = (\Delta M_{n,1}, \ldots, \Delta M_{n,K})$ and
$\Delta T_n = (\Delta T_{n,1}, \ldots, \Delta T_{n,K})$; that is, $|\Delta M_{n,k}| \le 1$ and
$\|\Delta T_{n,k}\| < \infty$, where $\Delta$ denotes the difference operator on a sequence
$\{z_n\}$; that is, $\Delta z_n = z_n - z_{n-1}$. (Here only the moment condition of martingale
differences is required for our purpose. Thus, other properties of $M_n$ and $T_n$ are
omitted. See Zhang et al. (2007) for further details.) Therefore, with probability one,

$$n^{-1/2}(N_n - nv) = n^{-1/2} M_n + (1 + o(1))\, n^{-1/2} \sum_{i=1}^{n} \sum_{k=1}^{K} \frac{T_{i,k}}{i} \left(\frac{\partial v}{\partial \theta_k}\right)' + o(1).$$

Similarly, by applying Lemma 1 again and arguments similar to Woodroofe (1982),
Example 1.8, we have that $\{n^{-1/2}(N_n - nv) : n = 1, 2, \ldots\}$ is u.c.i.p. This completes
the proof of Theorem 1 (vi).

Proof of Theorem 2

By the definition of $\hat\gamma = H'\hat\theta$ and $\mathrm{rank}(H) = h < p$, it is easy to see
that $\hat\gamma$ is a strongly consistent estimate of $\gamma$ and is asymptotically normally
distributed with covariance matrix $V_\gamma$. Moreover, it is clear that
$\{\sqrt{n}(\hat\gamma_n - \gamma) : n = 1, 2, \ldots\}$ is u.c.i.p., since $H$ is a non-random
matrix. Thus, the proofs of Theorem 2 (i) and (ii) follow from the same arguments
as in the proofs of Theorem 1 (i) and (ii). To prove (iii), we first note that by simple
matrix algebra, we have

$$\lambda_{\max}(V_\gamma) = \lambda_{\max}(H'VH) \le \lambda_{\max}(H'H) \cdot \lambda_{\max}(V).$$
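
The eigenvalue inequality can be verified on a small example. The sketch below uses hypothetical values (a $2 \times 2$ matrix $V$ and a single-column $H$, so $p = 2$ and $h = 1$) chosen only so that both sides can be computed by hand; in this case $H'VH$ and $H'H$ are scalars and equal their own largest eigenvalues.

```python
import math

def lam_max_sym2(m):
    """Largest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    return 0.5 * (a + c) + math.hypot(0.5 * (a - c), b)

def quad_form(h, v):
    """h' V h for a vector h and a square matrix V (the 1x1 matrix H'VH)."""
    size = len(h)
    return sum(h[i] * v[i][j] * h[j] for i in range(size) for j in range(size))

V = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical covariance matrix with eigenvalues 1 and 3
h = [1.0, 2.0]                # hypothetical H with a single column

lhs = quad_form(h, V)                          # lambda_max(H'VH)
rhs = sum(x * x for x in h) * lam_max_sym2(V)  # lambda_max(H'H) * lambda_max(V)
```

Here `lhs` is 14 and `rhs` is 15, consistent with the inequality; the bound follows in general from $x'H'VHx \le \lambda_{\max}(V)\,\|Hx\|^2 \le \lambda_{\max}(V)\,\lambda_{\max}(H'H)\,\|x\|^2$ for positive semidefinite $V$.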

Since $H$ is a pre-fixed non-random matrix, $\lambda_{\max}(H'H)$ ($= \lambda_H$, say) is a constant. Now, let

• t r x ^ \ C a 7 A max (V 7 ) 

r 5l = mi{n >n :n> X H '-— }. 

7 

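Then by definition, we have $\tau_{\delta,\gamma} \le \tilde\tau_{\delta,\gamma}$ almost everywhere,
where $\tau_{\delta,\gamma}$ is the stopping time of Theorem 2 and $\tilde\tau_{\delta,\gamma}$ is the
stopping time just defined. Moreover, it can be shown by the same arguments as above that
$\{\delta^2 \tilde\tau_{\delta,\gamma} : \delta \in (0, 1)\}$ is uniformly integrable. This implies
that $\{\delta^2 \tau_{\delta,\gamma} : \delta \in (0, 1)\}$ is uniformly integrable, and thus the
proof of (iii) of Theorem 2 is completed.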
Figure 1: Sequential CARA design with two treatment slopes and two covariate populations.
Table 1: Mean (M) and standard deviation (SD) of the stopping time ($\tau_\delta$), coverage
probability (CP), and correct allocation probability (CAP) of sequential 95% confidence
interval estimation with $\delta = 0.3$. $T_{0v}$ and $\eta_v$ indicate whether $T_0$ and $\eta$
vary or not.